Reward Shaping for Model-Based Bayesian Reinforcement Learning
Authors
Abstract
Bayesian reinforcement learning (BRL) provides a formal framework for optimally trading off exploration and exploitation in reinforcement learning. Unfortunately, computing the Bayes-optimal behavior is generally intractable except in restricted cases. As a consequence, many BRL algorithms, model-based approaches in particular, rely on approximate models or real-time search methods. In this paper, we present potential-based shaping for improving learning performance in model-based BRL. We propose a number of potential functions that are particularly well suited to BRL and are domain-independent in the sense that they require no prior knowledge about the actual environment. By incorporating the potential function into real-time heuristic search, we show that learning performance can be significantly improved on standard benchmark domains.
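For context, potential-based shaping in its standard form (Ng et al.) augments the environment reward with the term F(s, a, s') = γΦ(s') − Φ(s), which provably leaves the optimal policy unchanged for any potential Φ. A minimal Python sketch; the chain domain and potential values below are illustrative assumptions, not anything from the paper:

```python
GAMMA = 0.95  # discount factor (illustrative value)

def shaped_reward(r, s, s_next, phi, gamma=GAMMA):
    """Potential-based shaping: r'(s, a, s') = r + gamma * phi(s') - phi(s).

    For any potential function phi, this transformation preserves the
    optimal policy of the underlying MDP.
    """
    return r + gamma * phi[s_next] - phi[s]

# Toy 3-state chain with a hypothetical potential that increases toward the goal.
phi = {0: 0.0, 1: 1.0, 2: 2.0}
print(shaped_reward(0.0, 0, 1, phi))  # -> 0.95: progress toward the goal is rewarded
```

Note that the shaping term is negative for transitions that decrease the potential, so the agent receives immediate feedback even when the raw reward is zero.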
Similar Papers
Reward Shaping in Episodic Reinforcement Learning
Recent advances in reinforcement learning confirm that reinforcement learning techniques can solve large-scale problems, leading to high-quality autonomous decision making. It is only a matter of time until we see large-scale applications of reinforcement learning in sectors such as healthcare and cyber-security, among others. However, reinforcement learning can be time-consuming be...
Knowledge and Ignorance in Reinforcement Learning
The field of Reinforcement Learning is concerned with teaching agents to make optimal decisions that maximize their total utility in complex environments. A Reinforcement Learning problem, generally described by the Markov Decision Process formalism, has several complex interacting components, unlike other machine learning settings. I distinguish three: the state-space/ transition model, t...
Reward Shaping for Statistical Optimisation of Dialogue Management
This paper investigates the impact of reward shaping on a reinforcement learning-based spoken dialogue system’s learning. A diffuse reward function gives a reward after each transition between two dialogue states. A sparse function only gives a reward at the end of the dialogue. Reward shaping consists of learning a diffuse function without modifying the optimal policy compared to a sparse one....
Learning from Demonstration for Shaping through Inverse Reinforcement Learning
Model-free episodic reinforcement learning problems define the environment reward with functions that often provide only sparse information throughout the task. Consequently, agents are not given enough feedback about the fitness of their actions until the task ends with success or failure. Previous work addresses this problem with reward shaping. In this paper we introduce a novel approach to ...
Learning Shaping Rewards in Model-based Reinforcement Learning
Potential-based reward shaping has been shown to be a powerful method for improving the convergence rate of reinforcement learning agents. It is a flexible technique for incorporating background knowledge into temporal-difference learning in a principled way. However, the question remains how to compute the potential that is used to shape the reward given to the learning agent. In this paper...
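One concrete way to obtain such a potential in a model-based setting is to run value iteration on the agent's current learned model and use the resulting value function as Φ. The deterministic toy model below is an illustrative assumption for this sketch, not the paper's algorithm:

```python
def value_iteration(n_states, transitions, rewards, gamma=0.95, iters=200):
    """Value iteration on a deterministic tabular model.

    transitions[s][a] gives the successor state, rewards[s][a] the reward.
    """
    v = [0.0] * n_states
    for _ in range(iters):
        v = [max(rewards[s][a] + gamma * v[transitions[s][a]]
                 for a in range(len(transitions[s])))
             for s in range(n_states)]
    return v

# Toy 3-state chain the agent might have learned: action 0 = stay, 1 = move right.
# State 2 is absorbing and yields reward 1 per step.
transitions = [[0, 1], [1, 2], [2, 2]]
rewards = [[0.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
phi = value_iteration(3, transitions, rewards)  # take Phi(s) = V_model(s)
# phi grows toward the rewarding state, so the shaping term
# gamma * phi[s'] - phi[s] nudges exploration in that direction.
```

Because the model is only an estimate, such a potential is a heuristic: shaping with any Φ is policy-invariant, so even a crude model-derived potential can speed learning without biasing the final policy.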